Diagnosing data corruption

Computer servers store the data for all the videos, games and information accessed online. Krishnendu Chakrabarty, the Fulton Professor of Microelectronics, has been given the Driving Innovation in SDC Mitigation Award from the Open Compute Project Foundation to develop quality assurance testing that will improve the reliability and function of data servers. Photo by Erika Gronek/ASU
You are in your doctor’s office for your annual physical and you notice the change. This year, your doctor no longer has your health history in a five-inch stack of paperwork fastened together with twin prongs in a creased and fading manila folder. Your primary care physician has finally gone digital and accesses all your medical history on a smartphone or a tablet.
The doctor’s office has gone digital and that makes you feel better for a minute. Until you remember that security breach at your favorite retail store that exposed your debit card data. What if some error or accident puts at risk the records holding your medical data for the past 15 years? And you are not the only patient with information stored here; you are one of thousands.
Surely, somebody is making sure that won’t happen, right? Yes, thanks to university researchers working with industry.
As demand for data usage and storage grows exponentially, the scale, capabilities and applications of data storage have grown in tandem. All data must be stored in a physical location, either within the device itself or at a remote data center that the device accesses.
“In the old days, one defective part per million was considered great because that meant a system might fail maybe once in three years,” says Krishnendu Chakrabarty, the Fulton Professor of Microelectronics in the School of Electrical, Computer and Energy Engineering, part of the Ira A. Fulton Schools of Engineering at ASU. “But now, one cloud data center can have up to a million of these servers running simultaneously, meaning at least one of those parts could fail at any given moment.”
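The arithmetic behind that comparison can be sketched in a few lines of Python. This is a minimal illustration, not a figure from the researchers: it assumes, hypothetically, that each server contains exactly one such part, that parts are defective independently of one another, and that the defect rate is exactly one per million. The function names are invented for this example.

```python
import math


def prob_at_least_one_defect(defect_rate: float, num_parts: int) -> float:
    """Probability that at least one of num_parts independent parts is defective.

    Computes 1 - (1 - p)^n using log1p/expm1 so tiny rates do not lose precision.
    """
    return -math.expm1(num_parts * math.log1p(-defect_rate))


def expected_defects(defect_rate: float, num_parts: int) -> float:
    """Average number of defective parts in a population of num_parts."""
    return defect_rate * num_parts


if __name__ == "__main__":
    rate = 1e-6  # assumed: one defective part per million
    # Assumed population sizes: a small installation versus a large data center.
    for parts in (1_000, 1_000_000):
        chance = prob_at_least_one_defect(rate, parts)
        average = expected_defects(rate, parts)
        print(f"{parts:>9,} parts: P(at least one defect) = {chance:.4f}, "
              f"expected defects = {average:.3f}")
```

Under these assumptions, a population of a million parts has roughly a 63 percent chance of containing at least one defective part, compared with about a tenth of a percent for a thousand parts. The same defect rate that was once negligible becomes a near certainty at data center scale.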
Several large tech companies came together to address this issue by establishing the Open Compute Project Foundation in 2011. The consortium supports collaboration between foundries and university researchers to develop innovative data center design solutions.
Chakrabarty has received the Driving Innovation in SDC Mitigation Award from the Open Compute Project Foundation to improve quality control testing for silent data corruptions, or SDCs, by developing models based on generative artificial intelligence, or AI, techniques.
“A ‘one in a million’ failure is far too many,” he says.
A single data center can contain hundreds of thousands of servers. Most cloud computing is performed on these servers, each of which relies on complex semiconductor microprocessor chips.
In recent years, cloud data centers have faced a spontaneous and undetectable type of chip failure known as silent data corruption, which occurs when a central processing unit inadvertently introduces errors while processing data.
Most SDCs are not yet traceable by software, meaning data is often processed incorrectly and lost without any indication of the cause, Chakrabarty says.
Troubleshooting testing methodology
In industries such as health care, security and finance, cloud data centers manage an unfathomable amount of confidential user data. Chakrabarty and his team at the ASU Center for Semiconductor Microelectronics, or ACME, are working to secure the data by troubleshooting the issue at its source.
During quality assurance testing, the chip is evaluated for its ability to accurately perform tasks with known solutions, similar to asking, “What is two plus two?” Chips that respond correctly, such as answering “four,” are then approved, packaged and shipped to vendors.
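The idea of a test with known solutions can be sketched as follows. This is an illustrative toy in Python, not the test flow used by any chip maker: the adder functions, the test vectors and the specific defect are all hypothetical, chosen only to show how a part with a rare fault can still answer every question on a short test correctly.

```python
from typing import Callable, Iterable, Tuple

# Each test vector is (a, b, expected result of a + b).
TestVector = Tuple[int, int, int]


def passes_known_answer_test(add_unit: Callable[[int, int], int],
                             vectors: Iterable[TestVector]) -> bool:
    """Return True only if the unit gives the expected answer for every vector."""
    return all(add_unit(a, b) == expected for a, b, expected in vectors)


def healthy_adder(a: int, b: int) -> int:
    """Stand-in for a correctly working arithmetic unit."""
    return a + b


def marginal_adder(a: int, b: int) -> int:
    """Stand-in for a unit with a hypothetical defect.

    It returns a wrong result for one rare operand pattern and is
    correct for everything else.
    """
    if a == 0x7FFF and b == 1:
        return 0  # silently wrong, with no error raised
    return a + b


# A short, hypothetical production test: "What is two plus two?" and similar.
BASIC_VECTORS = [(2, 2, 4), (0, 0, 0), (10, 5, 15)]

if __name__ == "__main__":
    print("healthy passes: ", passes_known_answer_test(healthy_adder, BASIC_VECTORS))
    print("marginal passes:", passes_known_answer_test(marginal_adder, BASIC_VECTORS))
    # Only a vector that exercises the rare pattern exposes the defect.
    extended = BASIC_VECTORS + [(0x7FFF, 1, 0x8000)]
    print("marginal passes extended test:",
          passes_known_answer_test(marginal_adder, extended))
```

In this toy, both units pass the basic test, and the defective one is caught only when the test happens to include the input that triggers the fault. Real silent data corruptions depend on operating conditions as well as inputs, which this sketch does not model.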
Since SDCs do not occur when the chip is tested in a standalone setting, Chakrabarty is collaborating with Intel and ARM to introduce functional testing at earlier stages that simulates the environment in which the chips operate during use. In doing so, he will redesign the system to incorporate onboard sensors that check for and correct errors as they occur.
Beyond detecting underperforming chips, Chakrabarty plans to develop a machine learning algorithm to understand the cause of the failures and identify which stimuli or sequences of inputs lead a system to fail.
“The goal is not just to throw out the bad parts, but to see if we can learn from that and use that information as feedback to improve the chip manufacturing process,” Chakrabarty says. “We want to be able to go back to the fabricator and say, ‘look, this step in production needs to be adjusted’ so the foundry can make the necessary changes and improve the yield.”
Farshad Firouzi, an electrical engineering research scientist in the School of Electrical, Computer and Energy Engineering and a collaborator on the project with Chakrabarty, says the method is a breakthrough.
“Using large language models is a new approach, and we are among the first to apply this new technology in this context,” Firouzi says.
ACME is welcoming a full-time employee from Google this fall who will pursue a doctoral degree in electrical engineering and specialize in the work ACME is developing.
Matchmaking microelectronic masterpieces
Chakrabarty joins researchers at other renowned engineering universities, such as Carnegie Mellon and Stanford, who are being awarded support for innovative approaches that use AI to detect and diagnose the cause of SDCs.
“We are competing with the best of the best,” he says. “It’s a lot of pressure, but I enjoy the responsibility.”
Chakrabarty has a longstanding history of collaborating with major foundries and government agencies to deliver important research results. He is currently the chief technology officer at the Southwest Advanced Prototyping Hub, or the SWAP Hub, an ASU-led and U.S. Department of Defense-funded consortium geared to developing an ecosystem for advancing the prototyping, fabricating and packaging of microelectronics.
Chakrabarty facilitates connections at the SWAP Hub for more than 150 small businesses, universities and large companies, as well as countless stakeholders. He describes his work as an exercise in matchmaking and notes that the interdisciplinary and collaborative environment has enhanced his perspective toward his own work.
“Some days I feel like I’m back in graduate school because I’m learning new things every day,” he says. “Researchers can get very narrowly focused on their research topics, but in the SWAP Hub, I get to learn how different technologies cater to multiple disciplines and can compare different approaches. It’s a lot of fun.”
As Chakrabarty gears up to conduct the OCP-sponsored research, he is looking to recruit more students to work in the ACME Center, which he describes as a large ecosystem featuring a global perspective that emphasizes education and growth.
“Any student who is doing research with us in ACME will get a chance to work with industry, apply research and see that research being used in meaningful ways,” he says.